Computing Lexical Cohesion as a Tool for Text Analysis

Author

  • Hideki Kozima
Abstract

Recognizing the coherent structure of a text is an essential task in natural language understanding. It is necessary, for example, to resolve anaphora, ellipsis, and ambiguity. One of the dominant factors in the coherence of text structure is lexical cohesion, namely the dependency relationship between words based on associative relations in common knowledge. This thesis proposes an objective and computationally feasible method for measuring lexical cohesion, especially semantic relations, between words. Lexical cohesion between words is computed on a semantic network constructed systematically from a subset of an ordinary English dictionary. Spreading activation on the semantic network analyses the meaning of a word into a 2,851-dimensional semantic space and computes the strength of lexical cohesion between any two words in the dictionary. As an evaluation of the measurement of lexical cohesion, this thesis then presents a quantitative indicator, the Lexical Cohesion Profile (LCP), for segmenting narratives into scenes, the smallest domain in which text coherence can be defined. LCP is a record of the density of lexical cohesion of the words in a window (51 words long, in one example) that moves forward word by word through the text. Hills and valleys in a graph of LCP plotted against word position indicate the alternation of scenes in the text. A psychological experiment shows that LCP correlates closely with human judgements. The evaluation through this text-level application reveals that the proposed measurement of lexical cohesion works well as an indicator of the coherent structure of a text. The measurement of lexical cohesion provides semantic information for text analysis. The segmentation scheme provides the framework for recognizing coherent text structure. Both can be applied to various studies in a broad range of fields in natural language processing.
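The moving-window idea behind LCP can be illustrated with a minimal Python sketch. It is not the thesis's implementation: `toy_similarity` and `TOY_TOPICS` are hypothetical stand-ins for the spreading-activation measure on the dictionary-derived semantic network, and taking "density" to be the mean pairwise similarity inside the window, as well as truncating the window at the text boundaries, are assumptions made only for illustration.

```python
from itertools import combinations

# Hypothetical toy lexicon: each word is tagged with one topic.
# The thesis instead derives word relations from a dictionary-based
# semantic network; this table only stands in for that measure.
TOY_TOPICS = {
    "ship": "sea", "sail": "sea", "wave": "sea", "harbor": "sea",
    "bread": "food", "soup": "food", "meal": "food", "cook": "food",
}


def toy_similarity(a, b):
    """Return 1.0 if both words share a known topic, else 0.0 (assumption)."""
    ta, tb = TOY_TOPICS.get(a), TOY_TOPICS.get(b)
    return 1.0 if ta is not None and ta == tb else 0.0


def window_cohesion(words, similarity):
    """Mean pairwise similarity of the words in one window (assumed density)."""
    pairs = list(combinations(words, 2))
    if not pairs:
        return 0.0
    return sum(similarity(a, b) for a, b in pairs) / len(pairs)


def lexical_cohesion_profile(words, similarity, width=51):
    """Slide a window of `width` words over the text, one word at a time.

    The window is centred on each position and truncated at the text
    boundaries (an assumption; the abstract does not specify this).
    """
    half = width // 2
    profile = []
    for i in range(len(words)):
        lo = max(0, i - half)
        hi = min(len(words), i + half + 1)
        profile.append(window_cohesion(words[lo:hi], similarity))
    return profile


def valleys(profile):
    """Positions where the profile drops and does not fall further next step."""
    return [
        i for i in range(1, len(profile) - 1)
        if profile[i] < profile[i - 1] and profile[i] <= profile[i + 1]
    ]


text = ["ship", "sail", "wave", "harbor", "bread", "soup", "meal", "cook"]
profile = lexical_cohesion_profile(text, toy_similarity, width=3)
boundaries = valleys(profile)
```

In this toy text the profile dips where the window straddles the change from sea-related to food-related words, which is the kind of valley the abstract associates with a scene boundary. A real application would substitute a proper word-similarity measure and a wider window, such as the 51-word example mentioned above.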

Similar resources

WordNet for Lexical Cohesion Analysis

This paper describes an approach to the analysis of lexical cohesion using WordNet. The approach automatically annotates texts with potential cohesive ties, and supports various thesaurus based and text based search facilities as well as different views on the annotated texts. The purpose is to be able to investigate large amounts of text in order to get a clearer idea to what extent semantic r...

Term Relationships and their Contribution to Text Semantics and Information Literacy through Lexical Cohesion

An analysis of linguistic approaches to determining the lexical cohesion in text reveals differences in the types of lexical semantic relations (term relationships) that contribute to the continuity of lexical meaning in the text. Differences were also found in how these lexical relations join words together, sometimes with grammatical relations, to form larger groups of related words that some...

Disunity in Cohesion: How Purpose Affects Methods and Results When Analyzing Lexical Cohesion

Lexical Cohesion is a commonly studied linguistic feature as it is easily identified from the surface of a text. However, the purposes for studying lexical cohesion are varied, and each purpose requires different methods. This study analyzes two short movie review texts for four different research purposes using lexical cohesion: text evaluation, text segmentation, text summarization, and text ...

Computational assessment of lexical differences in L1 and L2 writing

The purpose of this paper is to provide a detailed analysis of how lexical differences related to cohesion and connectionist models can distinguish first language (L1) writers of English from second language (L2) writers of English. Key to this analysis is the use of the computational tool Coh-Metrix, which measures cohesion and text difficulty at various levels of language, discourse, and conc...

Lexical Cohesion and Literariness in Malcolm X's "The Ballot or the Bullet"

This paper unearths the contribution of lexical cohesion to the textuality and overall meaning of Malcolm X’s speech 'The Ballot or the Bullet'. Drawing on Halliday and Hasan’s (1976) and Hoey’s (1991) theory of cohesion, specifically lexical cohesion, whose main thrust is the role of lexical items in not only contributing to meaning but also serving as cohesive ties, the paper discusses how ...

Publication date: 1993